Nucleotide mismatches prevent intrinsic self-silencing of hpRNA transgenes to enhance RNAi stability in plants

Hairpin RNA (hpRNA) transgenes are the most successful RNA interference (RNAi) method in plants. Here, we show that hpRNA transgenes are invariably methylated in the inverted-repeat (IR) DNA and the adjacent promoter, causing transcriptional self-silencing. Nucleotide substitutions in the sense sequence, which disrupt the IR structure, prevent this intrinsic DNA methylation, resulting in more uniform and persistent RNAi. Substituting all cytosine with thymine nucleotides, in a G:U hpRNA design, prevents self-silencing but still allows hpRNA to form through G:U wobble base-pairing. The G:U design induces effective RNAi in 90–96% of transgenic lines, compared to 57–65% for the traditional hpRNA design. While a traditional hpRNA transgene shows increasing self-silencing from cotyledons to true leaves, its G:U counterpart avoids this and induces RNAi throughout plant growth. Furthermore, siRNAs from G:U and traditional hpRNA show different characteristics and appear to function via different pathways to induce target DNA methylation.

RNA silencing is an evolutionarily conserved gene silencing mechanism in eukaryotes, where long dsRNA is processed by Dicer or Dicer-like (DCL) proteins into 20-30 nucleotide (nt) small RNA (sRNA) that induces RNA degradation via sequence complementarity [1][2][3]. In plants, multiple RNA silencing pathways exist, including microRNA (miRNA), trans-acting small interfering RNA (tasiRNA), repeat-associated siRNA (rasiRNA), and exogenic (virus and transgene) siRNA (exo-siRNA) pathways 4. miRNAs are 20-24 nt sRNAs processed in the nucleus by DCL1 from short self-folding RNAs transcribed from MIR genes 2. tasiRNAs are 21 nt secondary siRNAs derived from DCL4 processing of dsRNA synthesized by RNA-dependent RNA polymerase 6 (RDR6) from miRNA-cleaved TAS RNA fragments 4. The 24-nt rasiRNAs are generated from repetitive DNA in the genome by the combined function of DNA-dependent RNA polymerase IV (Pol IV), RDR2, and DCL3 5. The exo-siRNA pathway overlaps with the tasiRNA and rasiRNA pathways, and both DCL4 and DCL3 are involved in exo-siRNA processing. In addition to DCL1, DCL3 and DCL4, plant genomes encode DCL2 or an equivalent, which generates 22-nt siRNAs, including 22-nt exo-siRNAs, and plays a key role in systemic and transitive gene silencing in plants 6. All of these plant sRNAs are methylated at the 2′-hydroxyl group of the 3′ terminal nucleotide by HUA Enhancer 1 (HEN1), which is thought to stabilize the sRNAs 7. miRNAs, tasiRNAs and exo-siRNAs are functionally similar to sRNAs in animals and are involved in post-transcriptional RNA degradation. rasiRNAs, however, function to direct de novo cytosine methylation at the cognate DNA, a transcriptional gene silencing mechanism known as RNA-directed DNA methylation (RdDM) 5. The post-transcriptional RNA silencing mechanism has been extensively exploited as a gene knockdown technology in various eukaryotic systems, generally referred to as gene silencing or RNA interference (RNAi) technologies.
In plants, the different RNA silencing pathways have led to different technical approaches, such as artificial miRNA, artificial tasiRNA, and virus-induced gene silencing technologies 4. Long hpRNA transgenes, designed to express long hairpin-structured dsRNA, are the most widely used RNAi technology in plants, and a variety of successful applications of this technology have been demonstrated in plant biotechnology 4. It can be anticipated that this RNAi approach will continue to be a powerful tool in many areas of crop improvement, such as host-induced RNAi against pests and pathogens and metabolic engineering of novel traits through spatial and temporal gene knockdown, which is difficult to achieve using gene knockout technologies such as the CRISPR/Cas9 approach.
An hpRNA construct typically consists of a perfect inverted-repeat (IR) of a target gene sequence (forming the dsRNA stem of hpRNA) separated by a spacer sequence (forming the loop). Tandem DNA repeats, particularly the IR DNA structures, are widely observed to attract strong DNA methylation causing transcriptional silencing 8,9. Besides the IR structure, siRNAs derived from hpRNA transgenes can potentially direct DNA methylation to their own sequence via the RdDM pathway 10,11. hpRNA transgenes, therefore, differ from normal transgenes and are potentially subject to self-induced transcriptional silencing. Indeed, a previous study showed that hpRNA transgene-induced RNAi in Arabidopsis was enhanced in an RdDM mutant and that this enhanced RNAi effect correlated with reduced DNA methylation spanning from the IR DNA to the upstream promoter sequence 12. An RNAi design that can prevent self-induced silencing would therefore be desirable for achieving durable and potent RNAi in plants.
In this study, we show that introducing nucleotide mismatches to disrupt IR DNA structure results in uniform and persistent RNAi in plants. Our results indicate that the traditional hpRNA transgenes with a perfect IR structure are generally prone to self-induced methylation and transcriptional silencing (referred to as self-silencing hereafter), causing large variability in RNAi efficacy, whereas the enhanced RNAi effect of mismatched hpRNA constructs is due to the prevention of methylation in both the IR and the promoters, which prevents self-silencing. In addition, we provide evidence that the IR-associated DNA methylation and self-silencing are not affected in two mutants of the RdDM pathway, and that siRNAs from G:U hpRNA transgenes are processed and function differently from those of traditional hpRNA transgenes, providing insights into IR-induced gene silencing in plants.

Results
Evenly mismatched and G:U base-paired hpRNA induce uniform RNAi. We first tested three mismatched constructs in Nicotiana tabacum lines PPGH11 and PPGH24 expressing the β-glucuronidase (GUS) reporter gene as the RNAi target (Fig. 1a). These constructs contained the same 200 bp antisense wild-type (WT) GUS sequence as the traditional hpRNA construct (hpGUS[WT]) to ensure perfect sequence complementarity between antisense siRNAs and target GUS mRNA. The mismatched construct hpGUS[1:4] had one nucleotide substitution in every 4 nucleotides of the 200 bp sense sequence; hpGUS[2:10] contained 2 consecutive nucleotide substitutions in every 10 nucleotides; and hpGUS[G:U] had all 52 cytosine (C) nucleotides changed to thymine (T) nucleotides (Fig. 1a; Supplementary Fig. 1). The C to T changes in hpGUS[G:U] disrupted the perfect IR DNA structure but did not prevent the formation of perfect hpRNA due to G:U wobble base-pairing.
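The logic of the G:U design can be illustrated with a minimal Python sketch. The 16-nt sequence and the function names (`to_gu_sense`, `stem_report`) are hypothetical and are not taken from the constructs used in this study; the sketch only shows, under these assumptions, why C to T changes in the sense strand create mismatches in the IR DNA while every position of the RNA stem can still pair through Watson-Crick or G:U wobble pairs.

```python
# Illustrative sketch only: toy sequence, not the GUS fragment used in the study.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# RNA pairs tolerated in the hairpin stem: Watson-Crick plus G:U wobble.
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def to_gu_sense(sense_dna: str) -> str:
    """Change every C in the sense strand to T (the G:U design)."""
    return sense_dna.replace("C", "T")


def stem_report(sense_dna: str, antisense_dna: str) -> tuple:
    """Return (fraction of matched DNA positions, fraction of paired RNA positions)."""
    n = len(sense_dna)
    dna_matched = 0
    rna_paired = 0
    for i, s in enumerate(sense_dna):
        a = antisense_dna[n - 1 - i]  # base opposite position i in the folded stem
        if COMPLEMENT[s] == a:
            dna_matched += 1
        if (s.replace("T", "U"), a.replace("T", "U")) in RNA_PAIRS:
            rna_paired += 1
    return dna_matched / n, rna_paired / n


wt_sense = "ATGCCGTACGGATCCA"            # hypothetical target fragment
antisense = reverse_complement(wt_sense)  # antisense arm stays wild-type
gu_sense = to_gu_sense(wt_sense)

print(stem_report(wt_sense, antisense))  # perfect IR: DNA and RNA fully paired
print(stem_report(gu_sense, antisense))  # G:U design: DNA mismatched, RNA still paired
```

In this toy case the five cytosines become DNA mismatches against the wild-type antisense arm, whereas the RNA stem remains fully paired because each of those positions forms a G:U pair.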
The hpGUS[WT] transgenic population showed a wide range of RNAi efficiency, with 35 of the 59 independent lines analyzed (59.3%) showing strong RNAi (GUS activity ≤10% of the PPGH11 and PPGH24 GUS-expressing control lines), 9 showing weak RNAi (GUS activity 10-30% of the control), and 15 almost no silencing (Table 1 and Fig. 1b), which was typical for traditional hpRNA constructs 13 (Fig. 1b). This uniform RNAi was not due to a uniform transgene insertion pattern across the independent lines: 16 randomly selected GUS-silenced lines showed a wide range of transgene copy numbers ( Supplementary Fig. 2 [1:4] coincided with its dsRNA stem having the lowest predicted thermodynamic stability (Fig. 1c). Consistently, there appeared to be a good correlation between the extent of GUS RNAi and the predicted dsRNA stability of the four hpRNAs (Fig. 1c).
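The scoring scheme used for the GUS lines can be sketched in Python as follows. The thresholds (strong RNAi at ≤10% of control GUS activity, weak RNAi at 10-30%) and the hpGUS[WT] line counts come from the text above; the function names, the treatment of the exact 30% boundary as weak, and the three example activity values are assumptions made for illustration.

```python
# Thresholds as stated in the text, expressed as % of control GUS activity.
STRONG_MAX = 10.0  # strong RNAi: activity <= 10% of control
WEAK_MAX = 30.0    # weak RNAi: activity 10-30% of control (boundary handling assumed)


def classify_rnai(gus_percent_of_control: float) -> str:
    """Assign a line to an RNAi class from its relative GUS activity."""
    if gus_percent_of_control <= STRONG_MAX:
        return "strong"
    if gus_percent_of_control <= WEAK_MAX:
        return "weak"
    return "none"


def percent(count: int, total: int) -> float:
    """Percentage of lines in a class, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Line counts reported for the hpGUS[WT] population.
hpgus_wt_counts = {"strong": 35, "weak": 9, "none": 15}
total_lines = sum(hpgus_wt_counts.values())

print(total_lines)                                     # 59
print(percent(hpgus_wt_counts["strong"], total_lines))  # 59.3

# Hypothetical activity values, for illustration only.
example_activities = [2.5, 18.0, 75.0]
print([classify_rnai(v) for v in example_activities])
```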
As the G:U hpRNA construct induced strong and uniform RNAi against GUS, we tested this design against two endogenous genes in Arabidopsis, ETHYLENE INSENSITIVE 2 (EIN2) and PHYTOENE DESATURASE (PDS), silencing of which can be scored based on hypocotyl length of dark-germinated seedlings on 1-aminocyclopropane-1-carboxylic acid (ACC) medium 14 and photo-bleaching 15 Fig. 3).
The GUS, EIN2, and PDS RNAi results collectively confirmed that the G:U hpRNA construct induces more uniform RNAi than the traditional hpRNA construct. Importantly, the PDS RNAi result indicated a developmental stage variability of RNAi by the traditional hpRNA transgene, being more effective in cotyledons than leaves, and suggested that the G:U hpRNA transgenes are developmentally more stable.
G:U hpRNA transgenes show diminished promoter methylation. The enhanced uniformity of RNAi by the mismatched hpRNA transgenes suggested a reduced level of self-silencing, which could be due to reduced DNA methylation at the mismatched transgenes compared to the traditional hpRNA transgenes. To investigate this, we analyzed DNA methylation in the hpGUS, hpEIN2, and hpPDS transgenes, using methylation-dependent enzyme McrBC-digestion PCR and bisulfite sequencing.
Many of the hpPDS[WT] lines showed strong PDS silencing in cotyledons but weak PDS silencing in leaves (Fig. 3 Taken together, the methylation analyses indicated that the relatively uniform RNAi of the mismatched hpRNA lines was due to diminished promoter methylation and that the traditional hpRNA transgenes are inherently prone to promoter methylation with all lines having some levels of promoter methylation. The results also suggested that promoter methylation of traditional hpRNA transgenes is developmental stage-dependent.
Methylation of hpRNA transgenes is retained in RdDM mutants. It was thought that the methylation in the IR region of a traditional hpRNA transgene is induced by hpRNA-derived siRNAs via the RdDM pathway. Consequently, it was expected that the traditional hpRNA transgenes would lose the methylation in an RdDM mutant, resulting in uniform RNAi across transgenic populations. It was also expected that the traditional hpRNA transgenes would induce more effective RNAi than the G:U hpRNA transgenes in RdDM mutants due to stronger dsRNA stability. We investigated these using two Arabidopsis RdDM mutants, nrpd1a-3 (a T-DNA insertion mutant of the upstream siRNA biogenesis factor Pol IV) and ago4-2 (a mutant of the downstream effector AGO4).
The traditional hpRNA constructs, targeting PDS or EIN2, indeed induced uniform RNAi in the two RdDM mutants, with 84-100% of transgenic lines showing RNAi (Table 2). The white cotyledon-to-green leaf-type of PDS RNAi phenotype of the Col-0
Fig. 2b, S4a for EIN2 silencing phenotypes). b Bisulfite sequencing of the 35S promoter-EIN2 junction region (the top strand only; the two arrows indicate the locations of the forward and reverse bisulfite PCR primers). Note that the 140 bp EIN2 sequence in hpEIN2[G:U] has no cytosines in the top strand, so the low levels of signals reflect the background cytosine noise in the sequencing trace files. c Box plot (R version 3.6.0) showing the average cytosine methylation levels of the 35S promoter region in (b). The central horizontal line indicates the median value, the lower and upper borders of the box represent the first and third quartiles, and the outliers are represented by dots. d-f The hpPDS[WT] transgene shows stronger 35S promoter methylation in true leaves than in cotyledons. d, e McrBC-digestion PCR and qPCR of primary T1 transgenic lines. Primary T1 lines were randomly divided into two pools (I and II) and photo-bleached cotyledons from multiple T1 transgenic plants within each pool were collected and combined to generate two DNA samples. Young leaf tissues were also harvested from the same two groups but were divided into four pools (two pools for each of the two cotyledon pools) (Ia, Ib, and IIa, IIb). f McrBC-digestion qPCR of T2 transgenic plants. As T1 plants with strong PDS RNAi did not grow to seed, only two lines each with moderate PDS RNAi were analyzed. For each of the four lines, photobleached cotyledons and the first true leaves that had just emerged were harvested from ~25 T2 progeny, and used for DNA extraction and McrBC-digestion qPCR analysis. For qPCR in (e) and (f), three technical replicates were measured for each sample, and data are presented as mean values ± s.d. with the three data points shown as grey dots. Source data are provided as a Source Data file.
Fig. 5a; Supplementary Fig. 6a) (Fig. 6). In contrast, DNA methylation was largely absent in the hpEIN2[G:U] transgenes in all backgrounds (except for nrpd1a/hp[G:U]−13). Similarly, strong methylation in the IR region (the sense PDS sequence) was also retained in the hpPDS[WT] lines in the ago4-2 and nrpd1a-3 mutants (Supplementary Fig. 6b). Thus, strong DNA methylation inside the perfect inverted-repeat DNA, as well as its spread to the upstream promoter, was retained in the two mutants of the RdDM components.
The EIN2 and PDS genomic targets (top strand) also showed DNA methylation but at a much lower level than the IR region of the hpEIN2[WT] and hpPDS [WT] transgenes, particularly at the CHG and CHH sites ( Fig. 6; Supplementary Fig. 6b; Supplementary Fig. 7) (Note that the four CG sites near the 5′ half of the EIN2 target were already heavily methylated in untransformed Col-0, ago4-2, and nrpd1a-3 plants; Supplementary Fig. 8). Unlike the hpRNA transgenes, target gene methylation was clearly reduced in the nrpd1a-3 mutant for the hpRNA[WT] lines, suggesting that Pol IV, the siRNA biogenesis component of the canonical RdDM pathway, is involved in hpRNA-induced target gene methylation. Remarkably, the G:U hpRNA transgenes induced similar levels of non-CG methylation at the target gene loci (top strand) in both the WT Col-0 and nrpd1a-3 backgrounds ( Fig. 6; Supplementary Fig. 6b; Supplementary Fig. 7), suggesting that the G:U hpRNA transgene-induced RdDM was less dependent on Pol IV than the traditional hpRNAinduced RdDM.
The bottom strand of EIN2 genomic target showed strongly reduced DNA methylation than the top strand in the hpEIN2[G:U] plants of Col-0 and nrpd1a-3 backgrounds, particularly of the nrpd1a-3 background (Supplementary Fig. 8). This strand bias suggested that RdDM requires strong sequence complementarity between siRNAs and target DNA and that the sense siRNAs from primary G:U hpRNA transcript were unable to efficiently induce methylation at the target DNA due to nucleotide mismatches with the bottom DNA strand.
Taken together, experiments with the RdDM mutants further indicated that DNA methylation of traditional hpRNA transgenes is intrinsic to the IR DNA structure which persisted in mutants of both the upstream (Pol IV) and downstream (AGO4) RdDM components. This intrinsic methylation prevented the traditional hpRNA transgenes from reaching their full RNAi efficacy, even in these RdDM mutants. However, the increased cross-line uniformity of PDS and EIN2 RNAi in the ago4-2 and nrpd1a mutants suggested that these RdDM components contribute to genomic position or copy number-dependent silencing of hpRNA transgenes.
Traditional and G:U hpRNAs are differently processed. One obvious question was whether G:U base-paired hpRNAs were efficiently processed by Dicer into siRNAs. Northern blot analysis detected abundant siRNAs from the hpEIN2[G:U] (Fig. 7a) and hpGUS[G:U] (Supplementary Fig. 9a) plants. The amount of siRNAs looked more even across the independent G:U hpRNA lines than the traditional hpRNA lines, and showed a good correlation with the extent of RNAi (Supplementary Fig. 4a, b). Thus, the uniform RNAi across independent G:U hpRNA lines could be attributed to relatively even amounts of siRNAs.  Fig. 9a). GUS target mRNA, from a highly expressed transgene, may serve as the template for the production of secondary siRNAs by RDR 16 (Fig. 7a, b; Supplementary Fig. 9a). Small RNA deep sequencing (sRNA-seq) detected no clear shift in siRNA size profiles between the traditional and G:U hpRNA lines, with the 21 nt siRNAs being always the dominant followed generally by the 24 nt or 22-23 nt siRNAs ( Fig. 8; Supplementary Fig. 9b). The antisense siRNAs in the hpEIN2[G:U] lines are less abundant than the sense, G:U modified siRNAs (Fig. 8). It is possible the sense siRNAs were relatively enriched for 5′ U and therefore preferentially loaded to AGO1 resulting in higher abundance than the antisense siRNAs.
The gel mobility difference of siRNAs between traditional and G:U hpRNA plants prompted us to investigate if they possess different chemical modifications at the 5′ and 3′ termini. Plant siRNAs are generally methylated at the 3′ terminus 7, and in accordance with this, a β-elimination assay indicated that siRNAs from both hpEIN2[WT] and hpEIN2[G:U] were 3′-methylated (Fig. 7c). Dicer-processed sRNAs were assumed to have 5′ monophosphate, but in C. elegans many siRNAs are found to possess di- or triphosphate, which increases gel mobility in denaturing polyacrylamide gels with high acrylamide:bis-acrylamide ratios 18. Alkaline phosphatase treatment reduced the gel mobility for both hpRNA[WT] and hpRNA[G:U]-derived siRNAs (Fig. 7d), which migrated at more similar positions than without the phosphatase treatment. This raises the possibility that the two siRNA populations may have different 5′ phosphorylation. The siRNA bands of hpEIN2[WT] plants aligned well with the 21 and 24 nt sRNA size markers that were monophosphorylated with radioactive 32P (Fig. 7a), suggesting that these siRNAs are largely monophosphorylated. The G:U hpRNA-derived siRNAs, with faster mobility, could therefore possess 5′ di- or multi-phosphate. This possibility was supported by the northern blot showing that the di- and triphosphorylated sRNA markers migrated faster than the unphosphorylated or monophosphorylated markers, and that the hpEIN2[G:U]-derived 24 nt siRNAs migrated at a closer position to the tri- and diphosphorylated 24 nt sRNA markers (Fig. 7e; Supplementary Fig. 11). Northern blot hybridization detected high amounts of long dsRNA species in the hpEIN2[G:U] lines but not in the hpEIN2[WT] plants (Fig. 7b), suggesting that the two types of hpRNA are processed differently. The Arabidopsis microRNA miR168 showed a smaller gel mobility shift than the trans-acting siRNA tasiR255 after alkaline phosphatase treatment (Fig. 7d), suggesting that plant endogenous sRNAs may also possess variable 5′ phosphorylation.

Discussion
In this study we showed that the traditional hpRNA transgenes are invariably methylated at the IR DNA structure and the adjacent promoter sequences, compromising RNAi efficiency. This widespread intrinsic DNA methylation and self-silencing of hpRNA transgenes has not been reported before but is nevertheless unsurprising. IR DNA structures have long been reported to attract DNA methylation that can extend a short distance to upstream promoters in plants, and the methylated IR locus can induce homology-dependent trans-methylation of single-copy loci in the genome [19][20][21]. The best-studied IR DNA is the naturally occurring PAI1-PAI4 locus in Arabidopsis ecotype Wassilewskija, which always carries dense DNA methylation independently of its transcriptional activity or RdDM factors 22. Evidence exists that supports DNA:DNA pairing in IR-induced methylation, but dsRNA and sRNA signals are also suggested to contribute to the methylation, particularly at the homologous trans-methylated non-IR loci 20,23. Our results showed that strong DNA methylation in the hpRNA transgenes was largely retained in the mutants of both the upstream siRNA biogenesis factor Pol IV (nrpd1a-3) and the downstream effector AGO4 (ago4-2) of RdDM, which seems to support an RdDM-independent DNA:DNA pairing model in IR methylation. However, the increased uniformity of RNAi across transgenic populations in the ago4-2 and nrpd1a-3 mutants by the traditional as well as G:U hpRNA transgenes suggests that hpRNA transgenes, like any type of transgene, are also subject to insertion pattern or position-dependent transcriptional silencing, and that RdDM plays a key role in this type of transgene silencing.
It is interesting to note that RNAi potency was generally reduced in the nrpd1a-3 mutant compared to wild-type Col-0 and the ago4-2 mutant, as indicated by the uniform but weak photobleaching phenotypes of hpPDS lines and the low amount of hpEIN2-derived siRNAs in the nrpd1a background. The nrpd1a-3 mutant contained a T-DNA insertion with a 35S promoter enhancer sequence that could cause transcriptional silencing of the 35S promoter driving hpRNA expression 24. However, the detection of relatively high-abundance long dsRNA intermediates as well as siRNAs in all three nrpd1a/hpEIN2[G:U] lines (Fig. 7b) suggested that substantial trans-inactivation of the 35S promoter did not occur in the nrpd1a background. It has been proposed previously that Pol IV may use either methylated DNA and/or dsRNA as templates to generate dsRNA and siRNAs 25. More direct evidence for the dsRNA-templated model came from a study showing that RNA virus-derived siRNAs, with no DNA source, are strongly reduced in a Pol IV mutant 26. The hpEIN2[G:U] plants accumulated high amounts of long dsRNA species, and the bulk of sense siRNAs had the C to U-modified sequence, indicating that siRNAs were mostly derived from direct Dicer processing of the primary G:U hpRNA transcript independent of Pol IV. For the hpEIN2[WT] transgenes, however, long dsRNA was almost undetectable and there was a strong reduction in siRNA accumulation in the nrpd1a-3 background (Fig. 7b). This raises the possibility that Pol IV may contribute specifically to siRNA production from the traditional hpRNA transgenes using the low amounts of the primary perfect dsRNA as a template. Interestingly, siRNA bands of hpRNA[WT] looked more scattered on the gel blot than those of hpRNA[G:U] (Fig. 7), which implies that hpRNA[WT]-derived siRNAs are a mixture of products from different biogenesis processes (e.g. direct Dicer processing of primary hpRNA plus Pol IV-mediated amplification) with different sizes or 5′ phosphorylation, and hence different gel mobility, unlike the G:U hpRNA-derived siRNAs that are largely derived from the primary hpRNA transcript.
The key finding of this study is that C to T substitutions or around 25% nucleotide modifications in the sense DNA sequence prevented the intrinsic methylation of the hpRNA transgenes, resulting in uniform RNAi across independent transgenic lines. The C to T substitutions also prevented the cotyledon to true leaf progression of methylation and self-silencing observed for the hpPDS[WT] transgene, a phenomenon that has not been reported before but has important implications in studying developmental stage-dependent RNAi and transcriptional gene silencing. Thus, disruption of perfect IR DNA structures is sufficient to block IR methylation and self-silencing of hpRNA transgenes. It is interesting to note that microRNA precursors in plants usually contain mismatches or G:U base-pairs in the duplex regions. Considering the results from our study, this structural feature may have evolved to disrupt IR DNA structure preventing transcriptional self-silencing of miRNA genes.
As illustrated by the different GUS RNAi efficacy by the four hpGUS constructs, reduced dsRNA stability due to nucleotide modifications in the sense strand reduces RNAi efficiency, presumably because of inefficient Dicer processing. Weak to moderate RNAi can have specific applications, particularly when the target genes are required for plant viability. The potential drawback of reduced RNAi, however, is largely overcome with the G:U hpRNA constructs, where the C-to-T changes in the sense sequence disrupt the IR DNA structure but still allow the formation of perfect hpRNA structure due to G:U wobble base-pairing. Consequently, all three G:U hpRNA constructs tested induced strong and uniform RNAi. hpRNAs containing multiple G:U base-pairs (up to 17.5%) have previously been shown to induce RNAi in animals and confer virus resistance in plants 27,28.
Fig. 2 and Supplementary Fig. 4a for RNAi phenotypes). Asterisks indicate samples for sRNA deep sequencing. b Detection of antisense sRNAs (upper panel) and long dsRNA (lower panel) using the same EIN2 probe as in (a). Ethidium bromide-stained rRNA is used as the loading control. c β-elimination (NaIO4 treatment) assay confirming similar 3′-O-methylation between hpEIN2[WT] and hpEIN2[G:U]-derived siRNAs (If hydroxyls on the 3′-terminal ribose are unmethylated, NaIO4 oxidizes them to form an unstable dialdehyde that leads to β-elimination of the terminal nucleoside and an approximately 2-nt downward mobility shift). d Alkaline phosphatase (CIP) treatment of sRNAs. The same gel blot was sequentially hybridized with the sense EIN2, trans-acting siRNA255 (tasiR255), and miR168 probes. Note that CIP treatment resulted in slowed but more similar siRNA gel migration between the two hpRNA designs. Also, note that tasiR255 showed a greater gel mobility shift than miR168 (see the different gaps between the two short red lines that indicate the average position of the CIP-treated and untreated tasiR255 and miR168 bands. It was unclear why the intensity of the tasiR255 band was markedly reduced upon CIP treatment). e Detection of hpEIN2[WT] and hpEIN2[G:U]-derived siRNAs together with 5′ labeled 21 and 24 nt (5′p-GUS21Me, 5′p-GUS24Me) or un-labeled 24 nt (5′ OH-GUS24Me) synthetic GUS sRNA markers (Supplementary Data 1), and in vitro transcribed mono- (5′P), di- (5′PP) and tri- (5′PPP) phosphorylated EIN2 sRNA markers (see Methods). All marker samples were mixed with 2 µg total RNA of WT Col-0 before loading. The un-labeled GUS24 and EIN2 sRNA markers were visualized by hybridization with respective antisense oligonucleotide probes. Note that the gel migration of 24 nt sRNA markers was slowest for the non-phosphorylated 5′OH-GUS24Me and the fastest for the di- and triphosphorylated EIN2 markers. The Col/hp[WT] sample looked degraded on this gel but another gel was run to verify the sRNA pattern (Supplementary Fig. 11b). Source data are provided as a Source Data file.
In our study, all cytosines in the sense sequence, constituting 18-26% of the target sequences, were substituted in the G:U hpRNA constructs. Future studies should examine the number of C-to-T substitutions that are required for reducing self-silencing while maximizing RNAi efficiency.
Our study indicated that G:U hpRNA is differently processed compared to the traditional hpRNA. The hpEIN2[G:U] lines accumulated high amounts of distinct large-sized dsRNA intermediates, which were largely absent in the hpEIN2[WT] lines. Furthermore, while the hpEIN2[G:U] plants accumulated similar amounts of siRNAs to the hpEIN2[WT] lines (Fig. 7b), siRNAs from the two hpRNA designs showed different gel mobilities.
Alkaline phosphatase treatment homogenized the gel mobility of the two siRNA populations, raising the possibility that siRNAs from the G:U and traditional hpRNAs possess different 5′ phosphorylation. Similarly, the endogenous miR168 and tasiR255 sRNAs also showed different gel mobility and alkaline phosphatase-caused gel mobility shifts, suggesting that endogenous plant sRNAs could also have different 5′ phosphorylation. Methylation analysis of hpRNA transgenes in the nrpd1a-3 and ago4-2 mutants suggested that the G:U hpRNA-derived siRNAs, unlike those of the traditional hpRNA, induce RdDM through a Pol IV-independent pathway. Thus, G:U hpRNA-derived siRNAs may have distinct functional properties from the traditional hpRNA-derived siRNAs, possibly due to different biogenesis or 5′ modification. Further studies are needed to confirm and understand any difference in chemical modification of traditional and G:U hpRNA-derived siRNAs.
In conclusion, our study uncovered an RNAi construct design that overcomes transcriptional self-silencing to induce more uniform and persistent RNAi than the traditional hpRNA design, and shed light on IR DNA-induced gene silencing in plants. Beyond this theoretical interest, future studies should investigate whether G:U-modified and other mismatched hpRNA transgenes also have increased long-term stability, inducing effective RNAi over multiple generations, which would be important for field applications of RNAi in crop improvement.

Methods
Plant materials and growth conditions. Plants used in the experiments included Arabidopsis thaliana (ecotype Col-0), and transgenic Nicotiana tabacum Wisconsin 38 lines PPGH11 and PPGH24. These are two independent lines homozygous for the single-copy transgene expressing GUS driven by a promoter from the Cucurbita pepo PP2 gene 29 . The PP2-GUS plants were chosen as the testing plants because the PP2 promoter came from an endogenous gene with a different sequence to the 35S promoter used to drive the expression of the hpRNA transgenes, which therefore would prevent transcriptional silencing of the target GUS gene by promoter trans-inactivation. Plant seeds were sown either directly into soil, or placed first on MS plate for germination followed by transferring seedlings to soil. Plants were grown in a growth room (16 h light/8 h dark) at 22-24°C.
Construct preparation. For preparing GUS hpRNA constructs, the 200 bp GUS ORF sequence (nt. 801-1000 from the translational start codon ATG) was PCR-amplified using the oligonucleotide primer pair GUS-WT-F and GUS-WT-R (Supplementary Table 1), containing XhoI and BamHI sites or HindIII and KpnI sites, respectively. The PCR fragment was inserted into pGEM-T Easy (Promega Cat No. A1360), the correct nucleotide sequence confirmed by sequencing, and inserted as a BamHI/HindIII fragment into pKannibal 30, forming the 35S-P::PDK intron::antisense GUS::Ocs-T cassette (pMBW606). This plasmid was used as the base vector for assembling the four GUS hpRNA constructs as follows.  Table 1) followed by PCR extension of 3′ ends using the high fidelity LongAmp Taq polymerase (NEB Cat No. M0323S). Nucleotide substitutions in GUS[1:4] and GUS[2:10] followed this rule: C is changed to G, G to C, A to T and T to A. The PCR fragments were ligated into the pGEM-T Easy vector, the correct nucleotide sequences confirmed by sequencing, and then inserted as a XhoI/KpnI fragment into pMBW606. The resulting 35S promoter::hpRNA::OCS terminator cassette was excised with NotI and inserted into the NotI site of pART27 31, forming the four final hpGUS constructs.
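The substitution rule for the evenly mismatched sense sequences can be sketched in Python as below. The base-swap rule (C to G, G to C, A to T, T to A) and the spacing (one substitution per 4 nt, or 2 consecutive substitutions per 10 nt) follow the text; the position of the substituted bases within each window (here, the last bases of the window), the function names, and the repetitive test sequence are assumptions for illustration and do not reproduce the actual GUS[1:4] or GUS[2:10] sequences.

```python
# Base-swap rule stated in the text: C<->G and A<->T.
SWAP = {"C": "G", "G": "C", "A": "T", "T": "A"}


def substitute(seq: str, window: int, run: int) -> str:
    """Swap `run` consecutive bases in every `window` bases of `seq`.

    Assumption: the swapped bases are the last `run` bases of each window;
    the source does not state which positions were chosen.
    """
    out = list(seq)
    offset = window - run
    for start in range(0, len(seq), window):
        for k in range(start + offset, min(start + window, len(seq))):
            out[k] = SWAP[out[k]]
    return "".join(out)


def mismatch_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


toy = "ACGT" * 5       # hypothetical 20-nt sense sequence
one_in_four = substitute(toy, window=4, run=1)
two_in_ten = substitute(toy, window=10, run=2)
print(one_in_four)
print(two_in_ten)

# For a 200-nt sequence, the two spacings give 50 and 40 substitutions.
long_toy = "ACGT" * 50
print(mismatch_count(long_toy, substitute(long_toy, 4, 1)))
print(mismatch_count(long_toy, substitute(long_toy, 10, 2)))
```

Under these assumptions the 1:4 spacing modifies 25% of the sense nucleotides and the 2:10 spacing modifies 20%.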
For preparing the traditional and G:U base-paired EIN2 and PDS hpRNA constructs, DNA fragments spanning the 200 bp regions of the wild-type EIN2 cDNA were PCR-amplified from Arabidopsis thaliana Col-0 cDNA using the oligonucleotide primer pair EIN2wt-F and EIN2wt-R (Supplementary Table 1) and cloned into pGEM-T Easy. The 200 bp C to T modified sense sequence (EIN2[G:U]) was assembled by annealing the overlapping oligonucleotides EIN2-GU-F and EIN2-GU-R (Supplementary Table 1), followed by PCR extension of 3′ ends using LongAmp Taq polymerase, and also cloned into pGEM-T Easy and sequenced. DNA fragments of 450 bp wild-type and C-to-T modified sequences of PDS cDNA (Supplementary Table 2) were synthesized by GeneArt™.
The 35S-P::sense fragment::PDK intron::antisense fragment::OCS-T cassettes were prepared in the same way as for the hpGUS constructs. Essentially, the wildtype sequences were excised from the respective pGEM-T Easy plasmids by digestion with HindIII and BamHI, and inserted into pKannibal between the BamHI and HindIII sites so they would be in the antisense orientation relative to the 35S promoter. The wild-type or C to T modified fragments were then excised from the respective plasmids using XhoI and KpnI and inserted into the same sites of the respective antisense-containing clone. All of the cassettes in the pKannibal vector were then excised with NotI and inserted into pART27 to form the final binary vectors for plant transformation.
Stable transformation and identification of transgenic lines. All four GUS hpRNA constructs were transformed into the GUS-expressing tobacco lines PPGH11 and PPGH24 using the Agrobacterium-mediated leaf-disk method 32.
EIN2 and PDS hpRNA constructs were transformed into A. thaliana by the floral dipping method 33 . To select for transgenic Arabidopsis lines, mature seeds were sterilized 34 and spread on MS plates containing 50 µg/mL kanamycin (Sigma Aldrich Cat No. K1377) plus 150 µg/mL timentin (Fisher Scientific Cat No. NC9588884) to inhibit Agrobacterium growth. The phenotype of PDS silencing was recorded for the primary (T1) transformants. The surviving T1 lines of PDS hpRNA constructs, and those of EIN2 hpRNA construct, were transferred to soil, self-fertilized and grown to maturity. Seed collected from these plants (T2 seed) was used to establish T2 plants that were used for further gene silencing, DNA methylation, and transgene segregation analyses.
Analysis of GUS and EIN2 silencing. GUS activity was quantitatively determined using the fluorimetric 4-methylumbelliferyl-β-D-glucuronide (MUG; Merck Cat No. 89105) assay 34 . The relative GUS activity represents the slope value per 5 mg of protein. For T0 plants (the primary transformants), protein used for the MUG assay was extracted from 3 leaves of an individual plant, while for the second generation, the protein was extracted from a pool of multiple (20-50) transgenic plants.
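As a rough illustration of how a slope-based activity value of this kind could be computed, the following Python sketch fits a least-squares line to fluorescence readings over time and scales the slope to a reference protein amount. The time points, readings, units, and function names are hypothetical assumptions; the exact calculation used in the cited MUG protocol may differ.

```python
# Hedged sketch: slope of fluorescence versus time, scaled to a reference
# amount of protein. All numbers and names below are illustrative only.

def least_squares_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def relative_gus_activity(times_min, fluorescence, protein_in_assay,
                          reference_protein=5.0):
    """Fluorescence slope scaled to `reference_protein` units of protein.

    `protein_in_assay` and `reference_protein` must share the same unit.
    """
    slope = least_squares_slope(times_min, fluorescence)
    return slope * reference_protein / protein_in_assay


# Hypothetical readings taken at 0, 10, 20 and 30 min.
times = [0, 10, 20, 30]
readings = [100, 300, 500, 700]
print(relative_gus_activity(times, readings, protein_in_assay=5.0))
print(relative_gus_activity(times, readings, protein_in_assay=10.0))
```

Scaling by the protein amount keeps values comparable between samples whose extracts differ in protein content.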
For the EIN2 silencing assay, Arabidopsis seeds were sterilized 34 and plated on half-strength MS salt medium (Sigma Aldrich Cat No. M5519) (without organics) containing 5 mg/L 1-aminocyclopropane-1-carboxylic acid (ACC; Sigma Aldrich Cat No. 149101-M). The plated seeds were imbibed for 3 days at 4°C in the dark, transferred to 22°C under lights for 10 h to improve germination, and then incubated for 4 days in the dark. Around 10-12 seedlings from each transgenic line, representing the overall hypocotyl length distribution, were selected from the half-strength MS salt medium and positioned horizontally onto agar plates containing blue stain to visualize hypocotyl length. The seedlings were photographed using a digital camera and hypocotyl length was measured using ImageJ (http://rsb.info.nih.gov/ij).
DNA and RNA analysis. DNA, small RNA and large RNA from all transgenic tobacco lines were prepared following the phenol extraction method 10 : plant tissues were ground to powder in liquid nitrogen, and suspended quickly in a pre-heated 1:1 mixture of phenol and RNA extraction buffer (100 mM LiCl, 1% SDS, 100 mM Tris pH 9, 10 mM EDTA). An equal volume of chloroform was added and mixed, the mixture was centrifuged and the supernatant was transferred to a new tube. Lithium chloride was added to the supernatant to a final concentration of 2 M, and large RNA was precipitated at 4°C overnight. The supernatant from the large RNA precipitation was then mixed with 1 volume of isopropanol to precipitate DNA and small RNA. Total RNA from the T2 transgenic Arabidopsis lines was extracted using TRIzol® Reagent (Ambion® USA, Thermo Fisher Cat No. 15596018) according to the manufacturer's instructions. Genomic DNA from the T2 transgenic plants was isolated from leaves using a cetyltrimethylammonium bromide (CTAB) (Sigma Aldrich Cat No. H6269) method 35 .
For Southern blot hybridization, 10 µg of genomic DNA was digested with HindIII overnight at 37°C, separated in a 0.8% agarose gel, and blotted onto Hybond-N+ membrane (GE Healthcare) 10 . The blot was hybridized with a full-length octopine synthase (OCS) terminator sequence as probe, which was excised from pART7 31 with BamHI and NotI digestion, gel purified, and radioactively labeled with [α-32P]dCTP using the DecaLabel DNA Labeling Kit (Thermo Fisher Scientific Cat No. K0622) according to the manufacturer's instructions. The labeled DNA probe was purified using G-25 columns (Bio-strategy Cat No. 27-5325-01).
Sodium periodate treatment of RNA (β-elimination). Treatment of RNA with sodium periodate (NaIO4) (Sigma Aldrich Cat No. 71859) was performed according to Ebhardt et al. 36 . In brief, radiolabeled sRNA (0.025 pmol, mixed with 2 µg of total RNA from Col-0) or total RNA of Col-0 or T2 hpEIN2 plants (10 µg) was incubated in 15 µL of 10 mM HEPES (Sigma Aldrich Cat No. H3375), pH 7.0, containing 100 mM sodium periodate at 22°C for 10 min. Then 15 µL of formamide loading dye (supplemented with 5 mM EDTA) was added, and the mixture was heated in boiling water for 30 min before loading.
Preparation of differentially phosphorylated sRNA. T7 RNA polymerase transcription of 5′ mono-, di- and tri-phosphorylated 24-nt sRNAs was performed using guanosine monophosphate, guanosine diphosphate, or GTP 18 . Sequences of the DNA oligonucleotides containing T7 promoter and EIN2 sequences are shown in Supplementary Data 1.
Alkaline phosphatase treatment of sRNAs. Total RNA (10-20 µg) was incubated for 2 h at 37°C in a 100 µL reaction containing 1× CutSmart Buffer (New England Biolabs) and a total of 140 units of calf intestinal alkaline phosphatase (CIP) (New England Biolabs Cat No. M0525), added in four portions of 50, 30, 30, and 30 units at 0, 30, 60, and 90 min, respectively. After incubation, RNA was purified by phenol/chloroform extraction, precipitated with 10 µL of 3 M NaOAc and 250 µL of ethanol at −20°C overnight, and dissolved in 10 µL H2O. sRNA northern hybridization analysis of the CIP-treated and untreated RNA samples was performed as described above.
McrBC-digestion PCR and bisulphite PCR. Plant genomic DNA (~500 ng) was digested with 30 units of McrBC (NEB Cat No. M0272) in a 50 µL reaction volume at 37°C overnight. For McrBC-minus controls, the same amount of DNA was incubated overnight at 37°C in 50 µL reaction volumes containing the same buffer, but without the McrBC enzyme. 1 µL (50 ng) of digested and undigested DNA of each sample was used to set up PCR reactions using Taq DNA polymerase (NEB Cat No. M0273) along with ThermoPol buffer (NEB). The PCR product was electrophoresed on a 2% agarose gel, stained with ethidium bromide, and visualized by UV illumination.
Bisulfite conversion and purification were performed using the EpiTect Bisulfite kit (QIAGEN Cat No. 59124) following the procedures recommended by the manufacturer. Bisulfite PCR was performed as a nested PCR (two PCR reactions). The primers used in the first- and second-round PCR are listed in Supplementary Data 1. The PCR cycles were as follows 37 : 12 min at 94°C followed by 10 cycles of 1 min at 94°C, 2:30 min at 50°C, 1:30 min at 72°C, and 30 cycles of 1 min at 94°C, 1:30 min at 55°C, 1:30 min at 72°C, with a final extension of 10 min at 72°C. The PCR products from the second PCR were purified using the QIAquick PCR purification kit (QIAGEN Cat No. 28104) following the manufacturer's instructions. Approximately 50-200 ng of purified bisulfite PCR product was sequenced with BigDye Terminator V3.1 premix (Applied Biosystems) using one of the nested primers. Cytosine methylation levels were determined using the following procedure 38 : trace file data of the sequenced PCR products were opened using the BioEdit software (https://bioedit.software.informer.com), exported to Microsoft Excel using the 'Export trace values (tab-delimited text)' feature, and the relative peak heights of cytosines and thymines were calculated to indicate the relative degree of methylation at each cytosine location.
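The peak-height comparison described above can be sketched in a few lines of Python. The sketch assumes that the relative degree of methylation at a cytosine position is estimated as C / (C + T) from the two trace peak heights. This is a common convention for direct sequencing of bisulfite PCR products, but the exact formula of the cited procedure is not given here, so it should be read as an assumption. The positions, peak heights, and function names are hypothetical. For reads obtained with the opposite-strand primer, the corresponding G and A peaks would be compared instead.

```python
# Hedged sketch: relative methylation from Sanger trace peak heights.
# The C / (C + T) formula and all values below are assumptions made
# for illustration, not data or code from the paper.

def methylation_level(c_peak, t_peak):
    """Fraction of unconverted (methylated) cytosine at one position."""
    total = c_peak + t_peak
    if total == 0:
        return None  # no usable signal at this position
    return c_peak / total


def methylation_profile(peaks):
    """Map {position: (c_peak, t_peak)} to {position: methylation level}."""
    return {pos: methylation_level(c, t) for pos, (c, t) in peaks.items()}


# Hypothetical peak heights at three cytosine positions of an amplicon.
example_peaks = {12: (300, 100), 27: (0, 450), 41: (220, 220)}
print(methylation_profile(example_peaks))
```

A value near 1 indicates that most templates retained the cytosine after bisulfite conversion (methylated), while a value near 0 indicates near-complete conversion to thymine (unmethylated).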
Analysis of small RNA sequencing data. Cutadapt version 1.12 (https://cutadapt.readthedocs.io/en/stable/installation.html) was used to trim the adaptor sequences and filter out sequences longer than 35 nt or shorter than 18 nt. The clean reads were mapped to the reference hpEIN2 and hpGUS sequences, allowing no mismatches, using Bowtie version 1.2.3 (http://bowtie-bio.sourceforge.net/index.shtml). sRNA reads were normalized against total reads, including those mapped to the transgenes and to the Nicotiana or Arabidopsis genomes.
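The size filter and normalization described above can be mimicked in plain Python for illustration. The sketch does not call Cutadapt or Bowtie. It only reproduces the 18-35 nt length window stated in the text and expresses mapped read counts relative to the total read count. Scaling to reads per million is an assumption, since the text does not state the scaling factor, and the reads and function names are hypothetical.

```python
# Hedged sketch of the length filter and total-read normalization.
# Reads, counts, and the per-million scaling are illustrative assumptions.
from collections import Counter


def length_filter(reads, min_len=18, max_len=35):
    """Keep reads of min_len..max_len nt (inclusive), as stated in the text."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def size_distribution(reads):
    """Count reads by length, e.g. to compare 21-, 22- and 24-nt classes."""
    return Counter(len(r) for r in reads)


def reads_per_million(mapped_count, total_reads):
    """Normalize a mapped read count against the total read count."""
    return mapped_count * 1_000_000 / total_reads


# Hypothetical trimmed reads of 17, 18, 21, 35 and 36 nt.
toy_reads = ["A" * n for n in (17, 18, 21, 35, 36)]
clean = length_filter(toy_reads)
print(sorted(size_distribution(clean).items()))
print(reads_per_million(250, 2_000_000))
```

Normalizing against all reads, rather than only transgene-mapped reads, keeps siRNA abundance comparable between libraries of different sequencing depth.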
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The small RNA sequencing data are accessible via GEO accession GSE178565. Source data are provided with this paper.